

Search for: All records

Creators/Authors contains: "Tensmeyer, Chris"


  1. Using natural-language feedback to guide image generation and manipulation can greatly lower the effort and skill required. This topic has received increased attention in recent years through refinements of Generative Adversarial Networks (GANs); however, most existing works are limited to single-round interaction, which does not reflect real-world interactive image editing workflows. Furthermore, previous works dealing with multi-round scenarios are limited to predefined feedback sequences, which is also impractical. In this paper, we propose a novel framework for Text-based Interactive image generation and manipulation (TiGAN) that responds to users' natural-language feedback. TiGAN utilizes the powerful pre-trained CLIP model to understand users' natural-language feedback and exploits contrastive learning for a better text-to-image mapping. To maintain image consistency during interactions, TiGAN generates intermediate feature vectors aligned with the feedback and selectively feeds these vectors to our proposed generative model. Empirical results on several datasets show that TiGAN improves both interaction efficiency and image quality while better avoiding undesirable image manipulation during interactions.
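The core idea of scoring candidate images against natural-language feedback in a CLIP-style joint embedding space can be illustrated with a minimal sketch. This is not TiGAN's actual implementation: the function names, the embedding dimension, and the random placeholder embeddings are all assumptions standing in for the outputs of a real CLIP text encoder and image encoder.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_best_candidate(text_emb: np.ndarray,
                          candidate_embs: list) -> int:
    """Index of the candidate image embedding best aligned with the feedback text."""
    scores = [cosine_similarity(text_emb, c) for c in candidate_embs]
    return int(np.argmax(scores))

# Toy usage: random vectors stand in for CLIP embeddings (512-d is typical).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=512)            # placeholder for an encoded feedback string
candidates = [rng.normal(size=512) for _ in range(4)]  # placeholders for encoded images
best = select_best_candidate(text_emb, candidates)
```

In a real interactive loop, the selected candidate's latent vector would be carried forward to the next round so that edits stay consistent with both the image so far and the new feedback.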
  2. Del Bimbo, Alberto; Cucchiara, Rita; Sclaroff, Stan; Farinella, Giovanni M; Mei, Tao; Bertini, Marc; Escalante, Hugo J; Vezzani, Roberto. (Ed.)
    This work summarizes the results of the second Competition on Harvesting Raw Tables from Infographics (ICPR 2020 CHART-Infographics). Chart recognition is difficult and multifaceted, so for this competition we divided the process into the following tasks: Chart Image Classification (Task 1), Text Detection and Recognition (Task 2), Text Role Classification (Task 3), Axis Analysis (Task 4), Legend Analysis (Task 5), Plot Element Detection and Classification (Task 6.a), Data Extraction (Task 6.b), and End-to-End Data Extraction (Task 7). We provided two datasets for training and evaluating participant submissions. The first is based on synthetic charts (Adobe Synth) generated from real data sources using matplotlib. The second is based on manually annotated charts extracted from the Open Access section of PubMed Central (UB PMC). More than 25 teams registered, of which 7 submitted results for different tasks of the competition. While results on synthetic data are at times near perfect, the same models still have room to improve on data extraction from real charts. The data, annotation tools, and evaluation scripts have been publicly released for academic use.
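Detection-style tasks such as Text Detection (Task 2) or Plot Element Detection (Task 6.a) are typically scored with overlap metrics like intersection-over-union between predicted and ground-truth boxes. The competition's exact evaluation scripts are released separately; the sketch below only illustrates the standard IoU computation, and the (x1, y1, x2, y2) box format is an assumption.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Example: two 2x2 boxes overlapping in a 1x1 region share 1 of 7 units of area.
score = iou((0, 0, 2, 2), (1, 1, 3, 3))  # 1/7
```

A match is usually counted as correct when IoU exceeds a fixed threshold (0.5 is a common choice in detection benchmarks).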